AMENDMENTS TO THE CLAIMS:

This listing of claims will replace all prior versions, and listings, of claims in the 
application: 
Listing of Claims: 

1. (currently amended) A method for predicting a polyadenylation site comprising: 
inputting a plurality of RNA transcript sequences or sequences derived from RNA
transcript sequences, wherein at least one sequence has its poly A or poly T tract 
sequence; 

searching for a polyadenylation site, wherein the polyadenylation is an adenine rich 
region at the 3' end of the sequence or a thymine rich region at the 5' end of the
sequence; 

detecting the presence of polyadenylation signals neighboring the polyadenylation site by 
[[scanning]] analyzing [[the]] EST or RNA sequences or their corresponding genomic DNA
sequences. 

2. (currently amended) The method of Claim 1 wherein the step of searching for a 
polyadenylation site [[comprising]] comprises [[scanning]] analyzing the sequences for adenine
rich region at the 3' end of the sequence or thymine rich region at the 5' end of the
sequence. 

3. (original) The method of Claim 2 wherein the adenine rich region comprises
adenine in at least 50% of the region and the thymine region comprises thymine in at 
least 50% of the region. 

4. (original) The method of Claim 2 wherein the adenine rich region comprises 
adenine in at least 60% of the region and the thymine region comprises thymine in at 
least 60% of the region. 

5. (original) The method of Claim 2 wherein the adenine rich region comprises 
adenine in at least 70% of the region and the thymine region comprises thymine in at 
least 70% of the region. 

6. (original) The method of Claim 2 wherein the adenine rich region comprises 
adenine in at least 80% of the region and the thymine region comprises thymine in at 
least 80% of the region.
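
The percentage thresholds recited in Claims 3 to 6 can be illustrated with a minimal sketch. The function names and the `window` parameter are hypothetical and do not appear in the claims, which fix only the percentage of adenine or thymine in the region.

```python
def base_fraction(region: str, base: str) -> float:
    """Fraction of positions in `region` equal to `base` (case-insensitive)."""
    region = region.upper()
    return region.count(base.upper()) / len(region) if region else 0.0


def has_rich_terminus(seq: str, window: int = 20, threshold: float = 0.8) -> bool:
    """True if the 3'-terminal window is adenine rich or the 5'-terminal
    window is thymine rich at the given threshold.

    `window` is an assumed region length; `threshold` would be 0.5, 0.6,
    0.7 or 0.8 for Claims 3, 4, 5 and 6 respectively.
    """
    three_prime = seq[-window:]
    five_prime = seq[:window]
    return (base_fraction(three_prime, "A") >= threshold
            or base_fraction(five_prime, "T") >= threshold)
```

Lowering `threshold` from 0.8 to 0.5 admits progressively noisier tails, which mirrors the progression from Claim 6 back to Claim 3.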

7. (currently amended) The method of Claim 1 wherein a heuristic score n_A / (n_A +
0.5*(max(n_R - 20, 0))) is used for detecting adenine rich or thymine rich region; wherein n_A
is the number of adenines or thymines in the block, and n_R is the number of bases [[after]]
downstream of the block of adenines or thymines to the end of the sequence.
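
The heuristic score of Claim 7 can be sketched as follows. Treating the "block" as the longest uninterrupted run of the base is an assumption made only for illustration, and both function names are hypothetical.

```python
def tail_score(n_a: int, n_r: int) -> float:
    """Heuristic score n_A / (n_A + 0.5 * max(n_R - 20, 0))."""
    if n_a <= 0:
        return 0.0
    return n_a / (n_a + 0.5 * max(n_r - 20, 0))


def score_longest_block(seq: str, base: str = "A") -> float:
    """Score the longest uninterrupted run of `base` in `seq`.

    Assumption: the block is the longest run; n_R is the number of bases
    between the end of that run and the end of the sequence.
    """
    seq = seq.upper()
    best_start, best_len = 0, 0
    i = 0
    while i < len(seq):
        if seq[i] == base:
            j = i
            while j < len(seq) and seq[j] == base:
                j += 1
            if j - i > best_len:
                best_start, best_len = i, j - i
            i = j
        else:
            i += 1
    n_r = len(seq) - (best_start + best_len)
    return tail_score(best_len, n_r)
```

Up to 20 trailing bases leave the score at 1.0; each additional trailing base adds 0.5 to the denominator, so a block far from the end of the sequence scores lower.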

8. (currently amended) A method for detecting polyadenylation signal in a sequence
with a polyadenylation site comprising searching for a polyadenylation signal hexamer in
the sequence [[before]] 5' of the polyadenylation site.

9. (currently amended) The method of Claim 8 wherein the searching comprises 
evaluating the probability that there is a polyadenylation [[site]] signal: Pr(h=k/x) for
k=6,7,...,N, wherein the sequence [[before]] 5' of the polyadenylation site is x=(x_1, x_2, ...,
x_N) and where x_N is the most 3'-most base [[before]] upstream of the polyadenylation site.

10. (original) The method according to claim 9 wherein Pr(h=k/x) = Pr(x/h=k) 
Pr(h=k)/Pr(x). 

11. (original) The method of Claim 10 wherein Pr(h=k/x) = Pr(x_{k-5}, ..., x_k / h=k)
Pr(h=k)/Pr(x_{k-5}, ..., x_k) and wherein Pr(h=k) is the probability that the polyadenylation
hexamer is located at the position k in the sequence, at a distance (N-k) from the
polyadenylation site, Pr(x_{k-5}, ..., x_k / h=k) is the probability of observing the hexamer
(x_{k-5}, ..., x_k) given that it is a polyadenylation signal and Pr(x_{k-5}, ..., x_k / h≠k) is
the probability of observing the hexamer that it is not from the polyadenylation signal.

12. (currently amended) The method of Claim 11 wherein the step of detecting
comprises using a gamma function to produce a density distribution which places the
majority of its weight on the position located 5 to 25 bases distant from the
polyadenylation site. 
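
The position prior of Claim 12 and the posterior of Claims 10 and 11 can be sketched together. Everything numeric here is an assumption for illustration: the gamma shape and scale (4.0 and 4.0), the signal probabilities in `SIGNAL_PROB`, and the uniform background used in place of Pr(x_{k-5}, ..., x_k) are placeholders, not values taken from the claims.

```python
import math

# Illustrative placeholders only; not measured frequencies.
SIGNAL_PROB = {"AATAAA": 0.6, "ATTAAA": 0.15}
BACKGROUND_PROB = 1.0 / 4 ** 6  # uniform stand-in for the hexamer marginal


def gamma_pdf(d: float, shape: float = 4.0, scale: float = 4.0) -> float:
    """Gamma density at distance d (assumed shape and scale)."""
    if d <= 0:
        return 0.0
    log_pdf = ((shape - 1) * math.log(d) - d / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_pdf)


def position_prior(n: int) -> dict:
    """Pr(h=k) for k = 6..n, weighting each k by the gamma density at
    distance n - k from the polyadenylation site, then normalizing."""
    weights = {k: gamma_pdf(n - k) for k in range(6, n + 1)}
    total = sum(weights.values())
    if total == 0:
        return weights
    return {k: w / total for k, w in weights.items()}


def hexamer_scores(x: str) -> dict:
    """Score each end position k by Pr(hexamer / h=k) Pr(h=k) / Pr(hexamer)."""
    x = x.upper()
    prior = position_prior(len(x))
    scores = {}
    for k in range(6, len(x) + 1):
        hexamer = x[k - 6:k]  # bases x_{k-5}..x_k in 1-based numbering
        scores[k] = SIGNAL_PROB.get(hexamer, 0.0) * prior[k] / BACKGROUND_PROB
    return scores
```

With the assumed parameters the density has its mode at 12 bases and most of its mass between 5 and 25 bases, which is the behaviour Claim 12 recites.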

13. (original) The method of Claim 12 wherein Pr(x_{k-5}, ..., x_k / h≠k), the probability of
observing the hexamer given that it is not from the polyadenylation signal, is modeled
using a second-Markov model trained on data collected from human 3' UTRs.

14. (original) The method of Claim 13 wherein Pr(x_{k-5}, ..., x_k / h≠k) = Pr(x_{k-5})
Pr(x_{k-4}/x_{k-5}) Pr(x_{k-3}/x_{k-5}, x_{k-4}) Pr(x_{k-2}/x_{k-4}, x_{k-3})
Pr(x_{k-1}/x_{k-3}, x_{k-2}) Pr(x_k/x_{k-2}, x_{k-1}) wherein the first
term is a zero-order Markovian probability, the second is a first-order Markovian
probability and the remaining four terms are second order Markovian probabilities.
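
The six-term factorization of Claim 14 can be written out as a short sketch. The table layout (`p0`, `p1`, `p2`) and the function name are hypothetical, and the uniform tables below are placeholders standing in for probabilities that would be trained on 3' UTR data.

```python
BASES = "ACGT"

# Placeholder uniform tables; trained values would replace these.
P0 = {b: 0.25 for b in BASES}                                  # Pr(b)
P1 = {(a, b): 0.25 for a in BASES for b in BASES}              # Pr(b / a)
P2 = {(a + c, b): 0.25                                         # Pr(b / ac)
      for a in BASES for c in BASES for b in BASES}


def background_hexamer_prob(hexamer: str, p0: dict, p1: dict, p2: dict) -> float:
    """Probability of a hexamer under the background model: one zero-order
    term, one first-order term, then four second-order terms."""
    h = hexamer.upper()
    if len(h) != 6:
        raise ValueError("expected a hexamer")
    prob = p0[h[0]] * p1[(h[0], h[1])]
    for i in range(2, 6):
        # Condition base i on the two bases immediately before it.
        prob *= p2[(h[i - 2:i], h[i])]
    return prob
```

Under the uniform placeholder tables every hexamer has probability (1/4)^6; trained tables would make 3' UTR-typical hexamers more probable than others.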

15. (original) The method of Claim 14 wherein, for a kth-order Markov model, the
probability of base b following a word w of length k is estimated by the frequency of the
concatenated word (wb) divided by the frequency of the word w, where frequencies are
computed from the training datasets of 3'UTRs sequences.

16. (original) The method of Claim 15 wherein, for the case k = 0 (a zero order 
Markovian model), the probability of base b is estimated by its frequency in the dataset
divided by the size of the dataset. 
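
The frequency estimates of Claims 15 and 16 can be sketched as below. The function names are hypothetical, and counting every overlapping occurrence of a word across the training sequences is an assumption about how "frequency" is computed.

```python
from collections import Counter


def word_counts(sequences, length: int) -> Counter:
    """Count every overlapping word of the given length in the sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - length + 1):
            counts[seq[i:i + length]] += 1
    return counts


def conditional_prob(sequences, word: str, base: str) -> float:
    """Estimate Pr(base / word) for a Markov model of order len(word).

    Order 0 (empty word): frequency of the base divided by the dataset size.
    Order k > 0: frequency of word+base divided by frequency of word.
    """
    word, base = word.upper(), base.upper()
    if not word:
        total = sum(len(s) for s in sequences)
        return word_counts(sequences, 1)[base] / total if total else 0.0
    freq_w = word_counts(sequences, len(word))[word]
    freq_wb = word_counts(sequences, len(word) + 1)[word + base]
    return freq_wb / freq_w if freq_w else 0.0
```

For the single training sequence AATAAA, this gives 5/6 for adenine at order zero, 3/5 for adenine following A, and 1/3 for adenine following AA.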

17. (currently amended) A computer readable medium comprising computer- 
executable instructions for performing the method comprising: 

inputting a plurality of RNA transcript sequences or sequences derived from RNA
transcript sequences, wherein at least one sequence has its poly A or poly T tract
sequence; 

searching for a polyadenylation site, wherein the polyadenylation is an adenine rich
region at the 3' end of the sequence or a thymine rich region at the 5' end of the
sequence; 

detecting the presence of polyadenylation signals neighboring the polyadenylation site by 
[[scanning]] analyzing [[the]] EST or RNA sequences or their corresponding genomic DNA
sequences. 

18. (currently amended) The computer readable medium of claim 17 wherein the step 
of searching for a polyadenylation site [[comprising]] comprises [[scanning]] analyzing the
sequences for adenine rich region at the 3' end of the sequence or thymine rich region at
the 5' end of the sequence.

19. (original) The computer readable medium of Claim 18 wherein the adenine rich 
region comprises adenine in at least 50% of the region and the thymine region comprises 
thymine in at least 50% of the region. 

20. (original) The computer readable medium of Claim 19 wherein the adenine rich 
region comprises adenine in at least 60% of the region and the thymine region comprises 
thymine in at least 60% of the region. 

21. (original) The computer readable medium of Claim 20 wherein the adenine rich
region comprises adenine in at least 70% of the region and the thymine region comprises 
thymine in at least 70% of the region. 

Jun-15-05 01:24pm From-Affymetrix, Inc. 408 7315392 T-093 P. 008/014 F-323

Application No.: 10/028,416

22. (original) The computer readable medium of Claim 21 wherein the adenine rich 
region comprises adenine in at least 80% of the region and the thymine region comprises 
thymine in at least 80% of the region. 

23. (currently amended) The computer readable medium of Claim 17 wherein a
heuristic score n_A / (n_A + 0.5*(max(n_R - 20, 0))) is used for detecting adenine rich or
thymine rich region; wherein n_A is the number of adenines or thymines in the block, and
n_R is the number of bases [[after]] downstream of the block of adenines or thymines to
the end of the sequence.

24. (currently amended) A computer readable medium comprising computer 
executable instructions for performing the method comprising searching for a 
polyadenylation signal hexamer in the sequence [[before]] 5' of the polyadenylation site.

25. (currently amended) The computer readable medium of Claim 24 wherein the 
searching comprises evaluating the probability that there is a polyadenylation [[site]] 
signal: Pr(h=k/x) for k=6,7,...,N, wherein the sequence [[before]] 5' of the polyadenylation
site is x=(x_1, x_2, ..., x_N) and where x_N is the most 3'-most base [[before]] upstream of the
polyadenylation site. 

26. (original) The method according to claim 25 wherein: Pr(h=k/x) = Pr(x/h=k) 
Pr(h=k)/Pr(x).

27. (original) The computer readable method of Claim 26 wherein: Pr(h=k/x) =
Pr(x_{k-5}, ..., x_k / h=k) Pr(h=k)/Pr(x_{k-5}, ..., x_k) and wherein Pr(h=k) is the probability that the
polyadenylation hexamer is located at the position k in the sequence, at a distance (N-k)
from the polyadenylation site, Pr(x_{k-5}, ..., x_k / h=k) is the probability of observing the
hexamer (x_{k-5}, ..., x_k) given that it is a polyadenylation signal and Pr(x_{k-5}, ..., x_k / h≠k) is
the probability of observing the hexamer that it is not from the polyadenylation signal.

28. (currently amended) The computer readable medium of Claim 27 wherein the 
step of detecting comprises using a gamma function to produce a density distribution 
which places the majority of its weight on the position located 5 to 25 bases distant from
the polyadenylation site. 

29. (original) The computer readable medium of Claim 28 wherein Pr(x_{k-5},
..., x_k / h≠k), the probability of observing the hexamer given that it is not from the
polyadenylation signal, is modeled using a second-Markov model trained on data
collected from human 3' UTRs.

30. (original) The computer readable medium of Claim 29 wherein Pr(x_{k-5},
..., x_k / h≠k) = Pr(x_{k-5}) Pr(x_{k-4}/x_{k-5}) Pr(x_{k-3}/x_{k-5}, x_{k-4}) Pr(x_{k-2}/x_{k-4}, x_{k-3})
Pr(x_{k-1}/x_{k-3}, x_{k-2}) Pr(x_k/x_{k-2}, x_{k-1}) wherein the first term is a zero-order
Markovian probability, the second is a first-order Markovian probability and the
remaining four terms are second order Markovian probabilities.

31. (original) The computer readable medium of Claim 30 wherein, for a kth-order
Markov model, the probability of base b following a word w of length k is estimated by 
the frequency of the concatenated word (wb) divided by the frequency of the word w, 
where frequencies are computed from the training datasets of 3'UTRs sequences. 

32. (original) The computer readable medium of Claim 31 wherein, for the case k = 
0 (a zero order Markovian model), the probability of base b is estimated by its frequency 
in the dataset divided by the size of the dataset. 

33. (currently amended) A system comprising a processor; and a memory coupled 
with the processor, the memory storing a plurality of machine instructions that cause the 
processor to perform logical steps of the method comprising: 

inputting a plurality of RNA transcript sequences or Expressed Sequence Tags (EST)
[[sequences derived from RNA transcript sequences]], wherein at least one sequence has its
poly A or poly T tract sequence; 

searching for a polyadenylation site, wherein the polyadenylation is an adenine rich
region at the 3' end of the sequence or a thymine rich region at the 5' end of the
sequence; 

detecting the presence of polyadenylation signals neighboring the polyadenylation site by 
[[scanning]] analyzing [[the]] EST or RNA sequences or their corresponding genomic DNA
sequences. 

34. (currently amended) The system of claim 33 wherein the step of searching for a
polyadenylation site [[comprising]] comprises [[scanning]] analyzing the sequences for adenine
rich region at the 3' end of the sequence or thymine rich region at the 5' end of the
sequence. 

35. (original) The system of Claim 34 wherein the adenine rich region comprises 
adenine in at least 50% of the region and the thymine region comprises thymine in at
least 50% of the region. 

36. (original) The system of Claim 35 wherein the adenine rich region comprises 
adenine in at least 60% of the region and the thymine region comprises thymine in at 
least 60% of the region. 

37. (original) The system of Claim 36 wherein the adenine rich region comprises 
adenine in at least 70% of the region and the thymine region comprises thymine in at 
least 70% of the region. 

38. (original) The system of Claim 37 wherein the adenine rich region comprises 
adenine in at least 80% of the region and the thymine region comprises thymine in at 
least 80% of the region.

39. (currently amended) The system of Claim 33 wherein a heuristic score n_A / (n_A +
0.5*(max(n_R - 20, 0))) is used for detecting adenine rich or thymine rich region; wherein n_A
is the number of adenines or thymines in the block, and n_R is the number of bases [[after]]
downstream of the block of adenines or thymines to the end of the sequence.

40. (currently amended) A system comprising a processor; and a memory coupled 
with the processor, the memory storing a plurality of machine instructions that cause the 
processor to perform logical steps of the method for detecting polyadenylation signal in a
sequence with a polyadenylation site comprising: searching for a polyadenylation signal 
hexamer in the sequence [[before]] 5' of the polyadenylation site.

41. (currently amended) The system of Claim 40 wherein the searching comprises
evaluating the probability that there is a polyadenylation [[site]] signal: Pr(h=k/x) for
k=6,7,...,N, wherein the sequence [[before]] 5' of the polyadenylation site is x=(x_1, x_2, ...,
x_N) and where x_N is the most 3'-most base [[before]] upstream of the polyadenylation site.

42. (original) The system of Claim 41 wherein: Pr(h=k/x) = Pr(x/h=k) Pr(h=k)/Pr(x).

43. (original) The system of Claim 42 wherein Pr(h=k/x) = Pr(x_{k-5}, ..., x_k / h=k)
Pr(h=k)/Pr(x_{k-5}, ..., x_k) and wherein Pr(h=k) is the probability that the polyadenylation
hexamer is located at the position k in the sequence, at a distance (N-k) from the
polyadenylation site, Pr(x_{k-5}, ..., x_k / h=k) is the probability of observing the hexamer
(x_{k-5}, ..., x_k) given that it is a polyadenylation signal and Pr(x_{k-5}, ..., x_k / h≠k) is the
probability of observing the hexamer that it is not from the polyadenylation signal.

44. (currently amended) The system of Claim 43 wherein the step of detecting
comprises using a gamma function to produce a density distribution which places the
majority of its weight on the position located 5 to 25 bases distant from the
polyadenylation site.

45. (original) The system of Claim 44 wherein Pr(x_{k-5}, ..., x_k / h≠k), the probability of
observing the hexamer given that it is not from the polyadenylation signal, is modeled
using a second-Markov model trained on data collected from human 3' UTRs.

46. (original) The system of Claim 45 wherein Pr(x_{k-5}, ..., x_k / h≠k) = Pr(x_{k-5})
Pr(x_{k-4}/x_{k-5}) Pr(x_{k-3}/x_{k-5}, x_{k-4}) Pr(x_{k-2}/x_{k-4}, x_{k-3})
Pr(x_{k-1}/x_{k-3}, x_{k-2}) Pr(x_k/x_{k-2}, x_{k-1}) wherein the first
term is a zero-order Markovian probability, the second is a first-order Markovian
probability and the remaining four terms are second order Markovian probabilities.

47. (original) The system of Claim 46 wherein, for a kth-order Markov model, the
probability of base b following a word w of length k is estimated by the frequency of the 
concatenated word (wb) divided by the frequency of the word w, where frequencies are 
computed from the training datasets of 3'UTRs sequences. 

48. (original) The system of Claim 47 wherein, for the case k = 0 (a zero order 
Markovian model), the probability of base b is estimated by its frequency in the dataset 
divided by the size of the dataset. 